Skip to content

feat(#5352): healthgroups endpoint#5416

Open
ulischulte wants to merge 54 commits into
masterfrom
feature/5352-healthgroups-endpoint
Open

feat(#5352): healthgroups endpoint#5416
ulischulte wants to merge 54 commits into
masterfrom
feature/5352-healthgroups-endpoint

Conversation

@ulischulte

Copy link
Copy Markdown
Contributor

Closes #5352:

Added caching and a dedicated REST endpoint for health groups.

When SBA polls an application's /health endpoint, the response may include a groups field (e.g., ["liveness", "readiness"]). Previously this data was fetched on-the-fly from the client. Now:

  • HealthGroupsCache — A ConcurrentHashMap-based cache stores health groups per instance, updated automatically during each health status poll.
  • GET /instances/{id}/health-groups — A new REST endpoint serves the cached groups directly, avoiding repeated calls to client applications.
  • HealthGroupsCacheCleanupTrigger — Listens for InstanceDeregisteredEvent and evicts stale cache entries to prevent memory leaks.
  • UI integration — The health details view (details-health.vue) now calls the new server-side endpoint instead of fetching groups from the client's health response.

Key design decisions

  • Cache is populated as a side effect of the existing StatusUpdater health polling (no extra HTTP calls)
  • Thread-safe via ConcurrentHashMap + List.copyOf() for immutable snapshots
  • Backward-compatible: the protected convertStatusInfo(ClientResponse) API in StatusUpdater is preserved for subclasses
  • @ConditionalOnMissingBean on all new beans allows customization

@ulischulte ulischulte marked this pull request as ready for review June 5, 2026 15:18
@ulischulte ulischulte requested a review from a team as a code owner June 5, 2026 15:18
* specific data in {@link HealthGroupsCache} on receiving an
* {@link InstanceDeregisteredEvent}.
*/
public class HealthGroupsCacheCleanupTrigger extends AbstractEventHandler<InstanceDeregisteredEvent> {

@cdprete cdprete Jun 6, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit an overkill to me.

It would be enough to invoke the deletion in the StatusUpdater with a doOnNext as it's already done to persist the groups (actually, it's enough to add an if there to check the event type and either call the save or the delete method).
We're scheduling a thread to constantly check for a single even type, which barely happens, where the StatusUpdater is already handling all of them anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StatusUpdater polls the /health endpoint — it does not receive InstanceDeregisteredEvent. A deregistered instance simply stops being polled (doUpdateStatus returns early if !instance.isRegistered()), so eviction would never fire there.

I'll investigate a little further and probably will either
(a) keep a lightweight event subscription but reuse an existing handler/scheduler, or
(b) accept the small overhead since cleanup-on-deregister is the established pattern in SBA

@cdprete cdprete Jun 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StatusUpdater polls the /health endpoint — it does not receive InstanceDeregisteredEvent.

Indeed. For a moment I had the impression it's handled there because the journal view shows all of them.

I'll investigate a little further and probably will either
(a) keep a lightweight event subscription but reuse an existing handler/scheduler, or
(b) accept the small overhead since cleanup-on-deregister is the established pattern in SBA

Wouldn't be enough to have a Spring Boot's @EventListener for the specific application event and remove the entry from the cache when it's received?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SBA's InstanceDeregisteredEvent is not a Spring ApplicationEvent. It's a domain event emitted through a Reactor Sinks.Many (InstanceEventPublisher) and consumed via Flux.from(Publisher). These events are never published through Spring's ApplicationEventPublisher, so a plain @eventlistener method would never fire.

To still fulfill your request, I implemented a lightweight reactive subscriber bean, no dedicated scheduler, self-contained lifecycle.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To still fulfill your request

You make me look like the bad guy here :)

There are already performance issues around, with memory, so we should try to not add more now, on CPU level.


describe('SSE reactive updates', () => {
it('should call fetchHealth once on mount, not on SSE version changes', async () => {
it('should call fetchCachedHealthGroups once on mount, not on SSE version changes', async () => {

@cdprete cdprete Jun 6, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think we should call the fetchHealthGroups when an SSE is received now, otherwise stale data may be potentially used.

This comment is not specifically related to this test case, but it's a more general statement.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, and agreed in principle. The current else branch only invalidates the per-group details (group.data = null) but keeps the previously fetched group list, so a group added/removed at runtime would indeed show stale entries until the instance id changes or the view is remounted (although this seems to be an edge case).

I'll update onInstanceChanged so the same-instance branch also calls fetchHealthGroups() (which already resets healthGroupOpenStatus/healthGroupLoadingMap and clears stale data), making the displayed group list self-correcting on SSE updates.

@cdprete cdprete Jun 12, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.
Also because, from @SteKoe in a previous PR about a Thymeleaf fix, I discovered that the configuration properties can be changed at runtime from SBA.

Probably, to cover this case as well, we would need SSE also for the health groups, but I don't know how much worthy it really is. Unless a STATUS_CHANGED even is also created when the health response body changes in general and not just by the status (UP, DOWN and so on).
Performance and code clearity wise, it would then indeed be better to update the model of the instance to also have the health groups in it.

@SteKoe SteKoe changed the title Feature/5352 healthgroups endpoint feat(#5352): healthgroups endpoint Jun 12, 2026
this.cache.remove(instanceId);
}
else {
this.cache.put(instanceId, List.copyOf(groups));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should ensure that groups are unique, either here by using a LinkedHashSet instead of a List or in the UI to avoid potentially duplicated entries.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're now deduplicating directly in StatusUpdater.extractAndCacheHealthGroups

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of design, I would have done it here (it's a responsibility of the repository to decide what to store and how), but it's fine also like this for me.

if (Array.isArray(res.data.groups)) {
this.healthGroups = res.data.groups.map((name: string) => ({
if (Array.isArray(res.data)) {
this.healthGroups = res.data.map((name: string) => ({

@cdprete cdprete Jun 17, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ulischulte.

Will this assignment trigger a UI re-rendering?
If it's so, we should maybe check the content of the response first and set the new state (health groups + reset of state of the accorditions) only if there is indeed a change in the collection of the returned group names.
Otherwise, on every fetch, we'll be re-rendering the UI for nothing if nothing really has changed in the context of the health groups.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will trigger a re-rendering of UI. I will add a check to prevent this in terms of UX, as it will also reset the collapsed state. Changing groups in backend will not happen that often, though.

@cdprete cdprete Jun 20, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing groups in backend will not happen that often, though.

I was wondering exactly because their update will be very rare.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetchHealthGroups() now compares incoming vs. current group names and returns early when unchanged, preserving accordion open/loading state.

ulischulte and others added 6 commits June 19, 2026 12:06
# Conflicts:
#	spring-boot-admin-server/src/test/java/de/codecentric/boot/admin/server/services/StatusUpdaterTest.java
…in/server/services/StatusUpdater.java

Co-authored-by: Cosimo Damiano Prete <8491864+cdprete@users.noreply.github.com>
@SteKoe SteKoe force-pushed the feature/5352-healthgroups-endpoint branch from fa4bcf7 to ffe3250 Compare June 19, 2026 10:07
@cdprete

cdprete commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Hi @ulischulte.

What do you think about the idea of adding a metadata boolean property that the applications under monitoring can populate to indicate if they use health groups or not?
In this way, the additional HTTP call can be avoided if we already know in advance that a particular application is anyway not exposing (or not willing to expose) any health group.
If the property is missing, then we default it to true and we try to fetch the health groups anyway.

Moreover, another optimization (to be verified if it's indeed possible; for now it's just an idea) would be to not just return the group names, but the name of the health indicators they include as well. For example: my-group: [db, ldap].
In this way, in the UI, their health can be shown directly - since we already have the health of the single health indicators available as part of the status update event - without having to fetch it explicitly.

I'm still in favour of adding them to the STATUS_UPDATED event, to be honest - especially after the changes done for #5450 - but so be it :)

this.healthGroupLoadingMap = {};
}
// Re-fetch the (server-cached) group list on every instance change
this.fetchHealthGroups();

@cdprete cdprete Jun 20, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do this only if the instance has changed in some meaningful way regarding its status, which means to not do it if the event that gets sent is INFO_UPDATED.

If the process InfoContributor is enabled, the data returned by https://docs.spring.io/spring-boot/api/rest/actuator/info.html changes on every poll, which would mean we're re-fetching the health groups over and over for no real reason.

Also, if the status code didn't change from one event to another but only the details, I think it's meaningless to fetch them again.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just replaced watch: instance with a watch on a computed key = id:status:statusTimestamp. It deliberately excludes version, so INFO_UPDATED events (which bump version every poll with the process InfoContributor) no longer trigger refetches, while genuine status changes still do.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably even id:status is enough.
I don't know if using a less complex key could bring any benefit in terms of perf to the UI or not, though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ulischulte @SteKoe given that what is getting watched is getting changed here, would this change fix #5482 as well maybe?
Or is this the whole refresh triggered on an higher level?

SteKoe and others added 12 commits July 3, 2026 13:29
…am notifications (#5409)

Co-authored-by: Stephan Köninger <stephan.koeninger@codecentric.de>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
#5472)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
renovate Bot and others added 26 commits July 3, 2026 13:29
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Stephan Köninger <stephan.koeninger@codecentric.de>
…ugin to v1.1.3 (#5495)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Stephan Köninger <stephan.koeninger@codecentric.de>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…5503)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
… prevent duplicate group names and redundant groupname fetches
.exchangeToMono(this::convertStatusInfo)
.exchangeToMono((response) -> this.convertStatusInfo(response, instance.getId()))
.log(log.getName(), Level.FINEST)
.timeout(getTimeoutWithMargin())

@cdprete cdprete Jul 3, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe move this up after the uri "operator"?

Given that the cache is a ConcurrentMap and changing the timeout is not really possible as of today, it would be great to not make Reactor count all the things around the locking and context switching between threads as part of the overall timeout.

@cdprete

cdprete commented Jul 3, 2026 via email

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancement: Cache health groups and expose endpoint to fetch them

5 participants